feat: Implement two-sided verification check with check modes #487
MikaelMayer merged 203 commits into main
Implement the two-sided verification check design that distinguishes between 'always true', 'always false', 'indecisive', and 'unreachable' outcomes.

Key changes:
- Add checkSatAssuming to SMT Solver for assumption-based queries
- Replace Outcome inductive with VCOutcome structure containing two SMT.Result fields
- Add CheckMode enum (full/validity/satisfiability) to Options
- Update encoder to emit two check-sat-assuming commands
- Update SARIF output to handle nine possible outcome combinations
- Default to validity mode for backward compatibility

The two-sided check asks:
1. Can the property be true? (satisfiability check)
2. Can the property be false? (validity check)

This enables distinguishing:
- pass (sat, unsat): always true and reachable
- refuted (unsat, sat): always false and reachable
- indecisive (sat, sat): true or false depending on inputs
- unreachable (unsat, unsat): path condition contradictory
- Five partial outcomes when at least one check returns unknown

Breaking change: VCResult API changed, all consumers must be updated. Tests need updating to reflect new default behavior (validity mode only).

See TWO_SIDED_CHECK_IMPLEMENTATION.md for complete implementation details.
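The four decisive outcomes listed in this commit message can be illustrated with a small, self-contained Lean sketch. It is not the code from this PR: `Result` is a simplified stand-in for `SMT.Result` (which may carry models), the `Sketch` namespace and the `describe` helper are hypothetical, and the field names are borrowed from later commit messages in this PR.

```lean
namespace Sketch

/-- Simplified stand-in for the solver result type (assumption: no models attached). -/
inductive Result where
  | sat | unsat | unknown
  deriving Repr

/-- Hypothetical mirror of `VCOutcome`: one solver result per query. -/
structure VCOutcome where
  satisfiabilityProperty : Result  -- "Can the property be true?"
  validityProperty : Result        -- "Can the property be false?"
  deriving Repr

/-- Map a pair of results to the outcome names used in the commit message. -/
def VCOutcome.describe (o : VCOutcome) : String :=
  match o.satisfiabilityProperty, o.validityProperty with
  | .sat,   .unsat => "pass: always true and reachable"
  | .unsat, .sat   => "refuted: always false and reachable"
  | .sat,   .sat   => "indecisive: true or false depending on inputs"
  | .unsat, .unsat => "unreachable: path condition contradictory"
  | _,      _      => "partial: at least one check returned unknown"

#eval (VCOutcome.mk .sat .unsat).describe
-- "pass: always true and reachable"

end Sketch
```

The catch-all arm covers the five remaining combinations, which the later commits split into separately named base cases.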
- Add CLI parsing for --check-mode flag (full/validity/satisfiability)
- Remove deprecated --reach-check flag
- Update help message with check mode documentation
- Fix StrataVerify to use 'outcome' field instead of 'result'
- Update emoji symbols for better visual distinction:
  - ✅ for pass (valid and reachable)
  - ✔️ for always true if reachable
  - ✖️ for refuted if reachable
  - ❌ for refuted (always false and reachable)
  - ⛔ for unreachable
  - 🔶 for indecisive
  - ➕ for satisfiable
  - ➖ for reachable and can be false

- Add metadata fields: fullCheck, validityCheck, satisfiabilityCheck
- Add helper methods to check for these annotations
- Update verifySingleEnv to check metadata before using global checkMode
- Annotations override global --check-mode flag for specific statements

- Add VCOutcomeTests.lean with all 9 outcome combinations
- Test both predicate methods and emoji/label rendering
- Use named arguments for clarity
- Update SMTEncoderTests to use full check mode for existing tests
- Ensures backward compatibility with expected 'pass' outcome

- Add VCOutcomeTests.lean with all 9 outcome combinations
- Each test shows emoji and label in output for easy verification
- Use named arguments for clarity
- Update SMTEncoderTests to use full check mode for existing tests
- Ensures backward compatibility with expected 'pass' outcome

- Add VCOutcomeTests.lean with all 9 outcome combinations
- Use formatOutcome helper to avoid repetition
- Each test shows emoji and label in output
- Use named arguments for clarity
- Update SMTEncoderTests to use full check mode
- Ensures backward compatibility with expected 'pass' outcome

- Document CLI flag integration
- Document per-statement annotations
- Document emoji updates
- Document comprehensive test suite
- Document test fixes for backward compatibility

- Fix StrataVerify to properly format Except String VCOutcome
- Update StrataMain to use vcResult.outcome instead of vcResult.result
- Use isRefuted/isRefutedIfReachable predicates for failure detection
- Format outcomes with emoji and label

Clarifies that refuted outcome means reachable and always false

…ters
- Rename isRefuted -> isRefutedAndReachable
- Rename isIndecisive -> isIndecisiveAndReachable
- Rename isRefutedIfReachable -> isAlwaysFalseIfReachable
- Add backward compatibility aliases
- Add cross-cutting predicates: isAlwaysFalse, isAlwaysTrue, isReachable
- Enables filtering outcomes by properties across multiple cases

…ariants
- isPass: true if validityProperty is unsat (always true), regardless of reachability
- isPassAndReachable: true if (sat, unsat) - proven reachable and always true
- isPassIfReachable: true if (unknown, unsat) - always true if reachable
- Update label/emoji to use isPassAndReachable and isPassIfReachable
- Update test comments to reflect new naming
- Add backward compatibility alias isAlwaysTrueIfReachable

…overs all sat cases
- isSatisfiable: true for any sat satisfiabilityProperty
- isSatisfiableValidityUnknown: specific case (sat, unknown)
- Rename isPassIfReachable -> isPassReachabilityUnknown
- Rename isAlwaysFalseIfReachable -> isAlwaysFalseReachabilityUnknown
- Rename isReachableAndCanBeFalse -> isCanBeFalseAndReachable
- All predicates now have reachability info at the end
- Add backward compatibility aliases for all old names

- Nine base cases without 'is': passAndReachable, refutedAndReachable, etc.
- Derived predicates with 'is': isPass, isSatisfiable, isReachable, etc.
- Base cases represent exact outcome combinations
- Derived predicates check properties across multiple outcomes
- Update SarifOutput to use base cases in outcomeToLevel/outcomeToMessage
- Update label/emoji functions to use base cases
- Maintain backward compatibility aliases for all old names
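
The split between base cases and derived predicates can be sketched as follows. This is a hedged illustration rather than the PR's code: the `Sketch` namespace and the simplified `Result` type are assumptions, `isPass` and `isSatisfiable` follow the definitions given in the commit messages above, and the body of `isReachable` is a guess (the path is treated as proven reachable whenever either query returns sat).

```lean
namespace Sketch

/-- Simplified stand-in for the solver result type (assumption). -/
inductive Result where
  | sat | unsat | unknown
  deriving Repr, BEq

structure VCOutcome where
  satisfiabilityProperty : Result
  validityProperty : Result
  deriving Repr

/-- Base case: matches exactly one combination, (sat, unsat). -/
def VCOutcome.passAndReachable (o : VCOutcome) : Bool :=
  o.satisfiabilityProperty == Result.sat && o.validityProperty == Result.unsat

/-- Derived: validity query is unsat, regardless of reachability. -/
def VCOutcome.isPass (o : VCOutcome) : Bool :=
  o.validityProperty == Result.unsat

/-- Derived: true for any outcome whose satisfiability query is sat. -/
def VCOutcome.isSatisfiable (o : VCOutcome) : Bool :=
  o.satisfiabilityProperty == Result.sat

/-- Derived (assumed definition): a sat answer on either query witnesses the path. -/
def VCOutcome.isReachable (o : VCOutcome) : Bool :=
  o.satisfiabilityProperty == Result.sat || o.validityProperty == Result.sat

-- (unknown, unsat): passes, but reachability is not established.
def example1 : VCOutcome :=
  { satisfiabilityProperty := .unknown, validityProperty := .unsat }

#eval example1.isPass            -- true
#eval example1.passAndReachable  -- false
#eval example1.isReachable       -- false

end Sketch
```

A base case pins down both fields, while a derived predicate constrains only what it needs, so a single derived predicate can hold for several base cases.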

- Add VerificationMode enum: deductive vs bugFinding
- Deductive mode: only pass is success, anything not proven is error/warning
- Bug finding mode: refuted is error, unknown is acceptable warning
- Group outcomes by severity (one .none, one .error, one .warning, one .note per mode)
- Default to deductive mode for backward compatibility

…e isAlwaysFalse
- Deductive mode: only pass/unreachable are success/note, everything else is error
- Bug finding mode: use isAlwaysFalse predicate instead of listing base cases
- Cleaner and more maintainable

…achable is warning in deductive
- Consistent naming: use 'alwaysFalse' instead of 'refuted' in base cases
- Deductive mode: unreachable is warning (dead code detection)
- Update all references in Verifier.lean and SarifOutput.lean
- Maintain backward compatibility aliases

- Replace isAlwaysFalse with explicit base cases: alwaysFalseAndReachable, alwaysFalseReachabilityUnknown
- Add comment listing all error cases in deductive mode
- Clearer mapping from base cases to severity levels

- Remove 'Verification succeeded/failed' language
- Use neutral descriptions: 'Always true and reachable', 'Always false and reachable'
- Messages work for any property type (assertion, invariant, requires, etc.)
- Shorter and clearer messages

…nknown outcomes
- alwaysFalseReachabilityUnknown has validityProperty = unknown (not sat), no counterexample
- unknown outcome can have models from either satisfiability or validity property
- Show models from both properties when available for unknown outcome

- alwaysFalseReachabilityUnknown has validityProperty = unknown (no model)
- unknown outcome also has no models (Result.unknown carries no data)
- Only Result.sat carries counterexample models

…rties
- Eliminates redundant predicate checks in outcomeToMessage
- Single exhaustive match covers all 9 base cases plus error cases
- More concise and easier to verify correctness

- Test predicates, messages, and severity levels for each outcome
- Verify deductive and bug finding modes produce correct SARIF levels
- Self-contained test outputs with no numbered comments
- Tests ensure SARIF output matches predicate semantics

- Add missing validityCheck parameter (now takes satisfiabilityCheck and validityCheck)
- Use Except.ok/Except.error to avoid ambiguity

I will review this by tomorrow morning.

ok, thanks!

- Remove unused _md from Imperative.SMT.dischargeObligation
- Refactor outcomeToLevel to use cleaner if-else structure
- Remove backward compatibility aliases not in main
- Fix outdated comment in maskOutcome
- Use _ instead of _satResult in Verify.lean

… sat, val)
Addresses reviewer comment to use direct pattern matching instead of nested boolean logic. Each case is now a single match arm.

Summary
Replaces the single-sided `reachCheck` flag with a two-sided verification framework using orthogonal check mode and check amount flags. Each proof obligation now produces a `VCOutcome` with independent satisfiability and validity properties, enabling richer diagnostic feedback.

Problem
We want to perform richer checks on assert statements beyond simple validity. Covers are existential checks, where forking into two means the results are linked by an OR, so they are not suitable for detecting assertions that surely fail along a path. To find such failures, checks must be encoded as assertions, and we need extended diagnostics for them.

A previous PR opened the way by adding a reachability check, demonstrating that two checks per command are feasible. However, the reachability check missed an important case for bug-finding mode: from reachability + validity alone, we cannot derive the result of reachability + satisfiability. By testing both `P ∧ Q` (satisfiability) and `P ∧ ¬Q` (validity), where `P` is the path condition and `Q` is the property, we get two checks that together determine the validity and satisfiability of `Q` given `P`, and also derive reachability.

Solution
Two orthogonal flags replace `reachCheck`:
- Check mode (`--check-mode`): `deductive` (default) or `bugFinding`
- Check amount (`--check-amount`): `minimal` (default) or `full`

A per-statement `@[fullCheck]` annotation can override the global check amount.

Possible outcomes by mode
Default mode (`deductive`, `minimal`): validity check only for asserts, satisfiability check only for covers.

For assert statements (validity only, satisfiability masked to unknown):

For cover statements (satisfiability only, validity masked to unknown):

Bug-finding mode (`bugFinding`, `minimal`): satisfiability check only for all statement types. Same as the cover table above.

Full mode (`full`): both checks run, all 9 outcomes possible. The last two columns show the error reporting level in SARIF output for each mode (✅ = pass, 🔴 = error, 🟡 = warning, 🔵 = note).

[Table: outcomes by `P ∧ Q` and `P ∧ ¬Q` results]

Testing
All existing tests updated. New tests cover the full outcome matrix including per-statement `@[fullCheck]` annotations. BoogieToStrata integration tests, Python analysis tests, and SARIF output all updated.
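
As a rough illustration of the mode descriptions under Solution, the choice of which queries run for a statement might be modelled like this. Everything here is a hypothetical sketch: the type names, constructor names, and the `checksToRun` function are assumptions made for the example, not identifiers confirmed by this PR. A check that is not run corresponds to a result masked to unknown.

```lean
namespace Sketch

/-- Assumed names for the two flag values described under Solution. -/
inductive CheckMode where
  | deductive | bugFinding

inductive CheckAmount where
  | minimal | full

/-- Assumed statement kinds; only asserts and covers are modelled here. -/
inductive StmtKind where
  | assertStmt | coverStmt

/-- Which of the two solver queries are actually issued. -/
structure Checks where
  satisfiability : Bool  -- run the `P ∧ Q` query
  validity : Bool        -- run the `P ∧ ¬Q` query
  deriving Repr

/-- Select the queries for a statement under the given flags. -/
def checksToRun : CheckMode → CheckAmount → StmtKind → Checks
  | _,           .full,    _           => ⟨true, true⟩    -- full: both checks run
  | .deductive,  .minimal, .assertStmt => ⟨false, true⟩   -- asserts: validity only
  | .deductive,  .minimal, .coverStmt  => ⟨true, false⟩   -- covers: satisfiability only
  | .bugFinding, .minimal, _           => ⟨true, false⟩   -- satisfiability only for all

#eval checksToRun .deductive .minimal .assertStmt
#eval checksToRun .bugFinding .minimal .assertStmt

end Sketch
```

In this model a per-statement `@[fullCheck]` annotation would amount to calling `checksToRun` with `.full` for that statement, whatever the global check amount is.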